Data can be accessed through the API using this python library (py-tata). Data can also be retrieved in other ways using the API directly; however, that isn't documented yet and probably won't be in the foreseeable future.
To retrieve data, you must have an API key, which can be obtained through your account on the TataAQ website. Below, I will show how to retrieve data for one of the instruments and export it to an external file format (e.g. csv or feather).
In [1]:
import tataaq
YOUR_API_KEY_HERE = ""
api = tataaq.TataAQ(apikey=YOUR_API_KEY_HERE)
# Ping the server to see if we have valid auth credentials
resp = api.ping()
print(resp.status_code)
In [2]:
import pandas as pd
import feather
To retrieve information about a device, you need to know its Device ID. This can be found by looking at the website.
The device API endpoint returns a Response object from the python requests library; you can learn more about them by reading the requests documentation. All you really need to know will be shown below.
Example: grab information for device_id="EBAM001"
In [3]:
# Request device information for EBAM001
resp = api.device("EBAM001")
Access the status code of the previous request
In [4]:
resp.status_code
Out[4]:
Access the header information
In [5]:
resp.headers
Out[5]:
Access the json information (data)
In [6]:
resp.json()
Out[6]:
In [7]:
# Request the data
resp = api.data("EBAM001")
In [8]:
# Print the meta information
resp.json()['meta']
Out[8]:
We can get the actual data by accessing the "data" key in the resp.json() dictionary:
In [9]:
# print the first (index 0) row
resp.json()['data'][0]
Out[9]:
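The "data" key is just a list of dicts, so you can build a DataFrame from it yourself if you don't want to use the dataframe=True convenience shown later. A minimal sketch using made-up records (the field names here are hypothetical, not the actual EBAM schema):

```python
import pandas as pd

# Hypothetical records mimicking the shape of resp.json()['data']
records = [
    {"timestamp": "2017-01-01T00:00:00", "pm25": 12.0, "instrument": "EBAM001"},
    {"timestamp": "2017-01-01T01:00:00", "pm25": 15.5, "instrument": "EBAM001"},
]

# pandas turns a list of dicts straight into a DataFrame
df = pd.DataFrame(records)
print(df.shape)  # (2, 3)
```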
We can also add keywords to our request. The most useful ones are the following:
per_page: alter the number of data points sent per page (default is 50)
page: iterate over all pages
filter: a complex, very powerful keyword; examples are shown below
The filter keyword allows you to select by any column in the database. The most useful filters query over certain points in time. For example, if we wanted to return all data for EBAM001 after 2017-01-01, we would use the filter keyword as follows:
filter="timestamp,gt,2017-01-01"
We can also join multiple filter arguments together by separating them with a semicolon. For example, to return all data during the month of January 2017:
filter="timestamp,gt,2017-01-01;timestamp,lt,2017-02-01"
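These filter strings are just "column,operator,value" triples joined by semicolons, so they are easy to build programmatically. A small helper sketch (build_filter is my own function, not part of py-tata):

```python
def build_filter(*clauses):
    """Join (column, op, value) triples into a filter string."""
    return ";".join(",".join(map(str, clause)) for clause in clauses)

# Reproduce the January 2017 filter from above
f = build_filter(("timestamp", "gt", "2017-01-01"),
                 ("timestamp", "lt", "2017-02-01"))
print(f)  # timestamp,gt,2017-01-01;timestamp,lt,2017-02-01
```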
See below for working examples.
In [10]:
# return data after 2017-01-01
resp = api.data("EBAM001", per_page=100, filter="timestamp,gt,2017-01-01")
resp.json()['meta']
Out[10]:
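The page keyword mentioned above isn't demonstrated here, but the idea is a simple loop: request page 1, 2, 3, ... until you have everything. The sketch below uses a stand-in fetch_page function that fakes 250 points so the loop logic is runnable; in practice you would call api.data("EBAM001", page=page, per_page=...) and inspect the meta dict for the totals (the exact meta fields are an assumption):

```python
def fetch_page(page, per_page=100):
    # Stand-in for api.data("EBAM001", page=page, per_page=per_page);
    # fakes 250 total points served 100 at a time.
    total = 250
    start = (page - 1) * per_page
    rows = list(range(start, min(start + per_page, total)))
    return {"meta": {"page": page, "total": total}, "data": rows}

all_rows, page = [], 1
while True:
    resp = fetch_page(page)
    all_rows.extend(resp["data"])
    if len(all_rows) >= resp["meta"]["total"]:
        break
    page += 1

print(len(all_rows))  # 250
```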
In [11]:
meta, df = api.data("EBAM001", dataframe=True)
meta
Out[11]:
Let's take a look at our data now:
In [12]:
df.info()
Let's get all data from the EBAM for the year 2017
In [13]:
meta, df = api.data("EBAM001", per_page=10000, filter="timestamp,gt,2017-01-01", dataframe=True)
df.index = df['timestamp_local']
df.info()
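One caveat: assigning df['timestamp_local'] to the index keeps whatever dtype the column came back with. For time-based slicing and resampling you generally want a real DatetimeIndex, which means parsing first. A sketch on a synthetic frame (the column name matches the one above; the values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp_local": ["2017-01-01 00:00", "2017-01-01 01:00"],
    "pm25": [12.0, 15.5],
})

# Parse the strings to datetimes, then promote the column to the index
df["timestamp_local"] = pd.to_datetime(df["timestamp_local"])
df = df.set_index("timestamp_local")
print(df.index.dtype)  # datetime64[ns]
```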
In [14]:
# Delete a couple of columns so we can easily peek at the data
del df['instrument']
del df['timestamp']
df.head()
Out[14]:
Once in a DataFrame, it is super easy to export and save your data. I personally recommend using feather, as it is considerably faster than most alternatives and is language agnostic. There are libraries built for R, Python, and Julia, making it easy to analyze your data in any of those programming languages (OSS only, obviously).
To export the dataframe to feather, do the following:
In [15]:
%time feather.write_dataframe(df, "EBAM001_2017_data.feather")
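If feather isn't available in your environment, CSV is a slower but universal fallback. A minimal round-trip sketch using an in-memory buffer (a file path works the same way):

```python
import io
import pandas as pd

df = pd.DataFrame({"pm25": [12.0, 15.5], "flow": [16.7, 16.7]})

# Write to CSV; pass a path like "EBAM001_2017_data.csv" instead of buf
buf = io.StringIO()
df.to_csv(buf, index=False)

# Read it back to confirm the round trip
buf.seek(0)
df2 = pd.read_csv(buf)
print(df.equals(df2))  # True
```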
In [ ]: